Fast & Confident Probabilistic Categorisation
Abstract
We describe NRC’s submission to the Anomaly Detection/Text Mining competition organised at the Text Mining Workshop 2007. This submission relies on a straightforward implementation of the probabilistic categoriser described in [4]. The categoriser is adapted to handle multi-label documents, and a piecewise-linear confidence estimation layer is added to estimate the confidence of each assigned label. This technique achieves a score of 1.689 on the test data.
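The abstract does not say exactly how multi-label output and the confidence layer are combined. As an illustration only, the Python sketch that follows assumes the categoriser returns a posterior probability per category. It assigns every category whose probability reaches a threshold (falling back to the single most probable category), then maps each assigned probability to a confidence through a piecewise-linear function. The functions `assign_labels`, `piecewise_linear` and `label_with_confidence`, the 0.5 threshold and the example knots are hypothetical and are not taken from [4] or from NRC's system; in practice the knots would presumably be fitted on held-out data.

```python
from bisect import bisect_right


def piecewise_linear(x, knots):
    """Evaluate a piecewise-linear function given sorted (x, y) knots.

    Inputs outside the knot range are clamped to the first/last y value.
    """
    xs = [k[0] for k in knots]
    ys = [k[1] for k in knots]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def assign_labels(probs, threshold=0.5):
    """Hypothetical multi-label rule: keep every category whose posterior
    reaches the threshold; if none does, keep the most probable one."""
    labels = [c for c, p in probs.items() if p >= threshold]
    if not labels:
        labels = [max(probs, key=probs.get)]
    return sorted(labels)


def label_with_confidence(probs, knots, threshold=0.5):
    """Map each assigned category to a piecewise-linear confidence score."""
    return {c: piecewise_linear(probs[c], knots)
            for c in assign_labels(probs, threshold)}


if __name__ == "__main__":
    # Illustrative knots only; real ones would be estimated from data.
    knots = [(0.0, 0.0), (0.5, 0.2), (1.0, 1.0)]
    print(label_with_confidence({"a": 0.75, "b": 0.2}, knots))
```

With these example knots, a category with posterior 0.75 is assigned with a confidence of about 0.6, while the category at 0.2 is not assigned. A piecewise-linear mapping is cheap to evaluate and stays monotone as long as the knot y-values are non-decreasing, which keeps higher posteriors from receiving lower confidence.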
Similar sources
A Probabilistic Model for Fast and Confident*
Permission is granted to quote short excerpts and to reproduce figures and tables from this report, provided that the source of such material is fully acknowledged.
A Probabilistic Neighbourhood Translation Approach for Non-standard Text Categorisation
The need for non-standard text categorisation, i.e. based on some subtle criterion other than topics, may arise in various circumstances. In this study, we consider written responses to a standardised psychometric test for determining the personality traits of human subjects. A number of state-of-the-art text classifiers that have been very successful in standard topic-based classification pro...
Probabilistic Models for Hierarchical Clustering and Categorisation: Applications in the Information Society
We propose a new hierarchical generative model for textual data, where words may be generated by topic-specific distributions at any level in the hierarchy. This model is naturally well-suited to clustering documents in preset or automatically generated hierarchies, as well as categorising new documents in an existing hierarchy. Furthermore, we present a series of applications that can benefit...
The automatic categorisation of web documents is becoming crucial for organising the huge amount of information available on the Internet. We are facing a new challenge because web documents have a rich structure and are highly heterogeneous. Two ways to respond to this challenge are (1) using a representation of the content of web documents that captures these two characteristics ...